Remote Memory Access in Workstation Clusters
Authors
Abstract
Efficient sharing of memory resources in a cluster of workstations promises to greatly improve the performance and cost-effectiveness of the cluster when running large memory-intensive jobs. A point of interest is the hardware support required for good memory-sharing performance. We evaluate the performance of two models: the software-only model, which runs on a traditional distributed system configuration and requires support from the operating system to access remote memory; and the hardware-intensive model, which uses a specialized network interface to extend the memory system and allow direct access to remote memory. Using SimOS, we perform a fair comparison of the two memory-sharing models for a set of interesting compute-server workloads. We find that the software-only model, with current remote page-fault latencies, does not provide acceptable memory-sharing performance. The hardware shared-memory system provides stable performance across a range of latencies. If the remote page-fault latency can be reduced to 100 microseconds, the performance of the software-only model becomes acceptable for many, though not all, workloads. Considering the interconnection bandwidth required to sustain software-only page-level memory sharing, our experiments show that a gigabit network is necessary for good performance.
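The link between page-fault latency and network bandwidth can be made concrete with a back-of-the-envelope calculation. The sketch below is not the paper's methodology. It assumes a 4 KiB page (the abstract does not state a page size) and treats the entire fault latency as wire time, so it gives only a rough lower bound on the link rate needed to deliver one page within a given latency. The function name `page_transfer_mbps` and the sample latencies other than 100 microseconds are illustrative assumptions.

```python
# Rough lower bound on link bandwidth needed to ship one page per fault.
# Assumptions (not from the source): 4 KiB pages, and the whole fault
# latency is available for moving the page over the network.

PAGE_BYTES = 4096  # assumed page size


def page_transfer_mbps(page_bytes: int, fault_latency_us: float) -> float:
    """Return the bandwidth in Mbit/s needed to move one page in one latency.

    One bit per microsecond equals one Mbit/s, so the result is simply
    (page size in bits) / (latency in microseconds).
    """
    if fault_latency_us <= 0:
        raise ValueError("fault latency must be positive")
    return (page_bytes * 8) / fault_latency_us


if __name__ == "__main__":
    # 100 us is the target latency named in the abstract; the other
    # values are illustrative only.
    for latency_us in (100.0, 500.0, 1000.0):
        mbps = page_transfer_mbps(PAGE_BYTES, latency_us)
        print(f"{latency_us:7.0f} us fault -> at least {mbps:8.2f} Mbit/s")
```

Under these assumptions, a 100-microsecond fault implies roughly 328 Mbit/s for the page data alone, which is beyond a 100 Mbit/s link and within reach of a gigabit one. Software overhead and concurrent faults, which this sketch ignores, would push the requirement higher.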
Similar resources
Efficient Support for Multicomputing on ATM Networks
The emergence of a new generation of networks will dramatically increase the attractiveness of loosely-coupled multicomputers based on workstation clusters. The key to achieving high performance in this environment is efficient network access, because the cost of remote access dictates the granularity of parallelism that can be supported. Thus, in addition to traditional distribution mechanisms...
Exploiting Object Locality in JavaParty, a Distributed Computing Environment for Workstation Clusters
In a distributed programming environment with location transparency, fast access to remote resources is absolutely critical for efficient program execution but it is not sufficient. Locality optimization will try to group objects according to their communication patterns and replace remote access by local access whenever possible. Locality optimization is based on the assumption that local access a...
Telegraphos: High-Performance Networking for Parallel Processing on Workstation Clusters
Networks of workstations and high-performance microcomputers have rarely been used for running high-performance applications like multimedia, simulations, scientific and engineering applications, because, although they have significant aggregate computing power, they lack the support for efficient message-passing and shared-memory communication. In this paper we present Telegraphos, a distributed...
Dodo: A User-level System for Exploiting Idle Memory in Workstation Clusters
In this paper, we present the design and implementation of Dodo, an efficient user-level system for harvesting idle memory in off-the-shelf clusters of workstations. Dodo enables data-intensive applications to use remote memory in a cluster as an intermediate cache between local memory and disk. It requires no modifications to the operating system and/or processor firmware and is hence portable to mu...
Computation-Communication Overlap on Network-of-Workstation Multiprocessors
This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses. The transformation peels from a...